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An algorithm is presented for error correction in the surface code quantum memory. This is shown 
to correct depolarizing noise up to a threshold error rate of 18.5%, exceeding previous results and 
coming close to the upper bound of 18.9%. The time complexity of the algorithm is found to be 
sub-exponential with the system size, offering a significant speed-up over brute force methods and 
allowing efficient error correction for codes of realistic sizes. 
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Introduction: Topological error correcting codes, and 
the topological quantum computation that they may be 
used for, have attracted wide attention in recent years [T]- 
[S]. As such, it is important to determine the threshold 
error rates for realistic error models. The most stud- 
ied, and most realistic topological error correcting codes 
are the surface codes [U [2] , and the most realistic error 
model that is well-studied is that of depolarizing noise. 
The application of this noise model to a surface code 
induces correlations between different kinds of topologi- 
cal defects. Thus far, thresholds for error correction al- 
gorithms have only been able to saturate the bound of 
16.5%, the upper bound achievable when the correlations 
are ignored [BHS] . Here we present an algorithm that sur- 
passes this bound, running in sub-exponential time. A 
threshold of 18.5% is found, falling only a little short of 
the recently estabhshed 18.9% hmit 

The planar code: The algorithm presented below is 
designed to correct errors in the the planar code, the 
planar variant of Kitaev's surface codes ID [2]. The code 
is defined on the spin lattice of Fig. [l] where a spin-1/2 
particle is placed on each vertex. The following Hermi- 
tian operators are then defined around each plaquette of 
the lattice, 

A^l[af, B, = Y[a!. (1) 

These operators determine the anyonic occupation of 
their corresponding plaquettes, with so-called flux anyons 
on the p-plaquettes and charge anyons on the s- 
plaquettes. Since the operators mutually commute, they 
also form the stabilizers of a stabilizer code. The any- 
onic vacuum is the corresponding stabilizer space and 
the anyon configuration is the syndrome. The code can 
store a single qubit, whose state is determined by the 
anyonic occupations of the edges. The X (Z) basis of 
the stored qubit may be chosen such that the | (| 0)) 
state corresponds to the vacuum on the top (left) edge 
and I — ) (I 1)) corresponds to a flux (charge) anyon. The 
effect of errors on the spins is to create and move anyons, 
causing logical errors when they are moved off the edges. 

Depolarizing noise: The error model considered in this 
study is that of single qubit depolarizing noise. This is 




FIG. 1. The spin lattice on which an L x L planar code is 
defined, with s-plaquettes shown in blue and p-plaquettes in 
white. A spin-1/2 particle resides on each vertex. The linear 
size L is characterized by the number of s-plaquettes along 
each side, with L = 4 in this case. 



characterized by an error rate, p, which is taken to be 
equal for all spins. The probability that no error occurs 
on a spin is 1 — p. Otherwise, a tr^, or error is 
applied, each with probability p/3. Such noise therefore 
takes an arbitrary single qubit state p and transforms it 
to, 

Dpip)^il-p)p+ J2 (p/3)a>a". (2) 

a—x,y,z 

In the planar code, such noise results in correlations 
between the configurations of charge and flux anyons. 
Should these be ignored, error correction can be achieved 
so long as the probability that either a cr^ or a error 
occurs (or equivalently a cr^ or a cr^ error) is less than 
around 11% [5]. This gives a threshold of Pc « 16.5%. 
If the correlations are taken into account, the threshold 
increases to Pc « 18.9% OHO]. 

Error correction: Suppose a planar code, initially pre- 
pared in a state of the stabilizer space, is subject to de- 
polarizing noise with a known rate p. The first step in 
error correction is to measure all stabilizers, and hence 
determine which plaquettes hold an anyon. We will as- 
sume that this can be done perfectly. The next step is to 
use the knowledge of the anyon configuration to deter- 
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mine which set of logical errors are most likely to have 
occurred. 

Let us use e to denote a configuration of errors, which 
records whether 1, a^^ Uy or tr^ has occurred on each 
physical spin. Let us also use A to denote a configuration 
of anyons and E to denote the logical error (1, X, Y or 
Z) that has occurred on the encoded qubit. Each e is 
consistent with a unique A and a unique E, so let us also 
use A and E to denote the set of e consistent with the 
anyon configuration A and logical error iJ, respectively. 
Given an anyon configuration A after measurement of the 
stabilizers, the probability for each logical error is, 

P{E\A)^ J2 Pie\A). (3) 

eGAnE 

Here P{e\A) is the probability that the error configura- 
tion e occurred given that the anyon configuration is A, 
etc. It can then be assumed that whichever E is most 
likely is that which occurred, and error correction can be 
performed accordingly. For any p < pc, this error cor- 
rection procedure succeeds with a probability that tends 
to unity as L — > oo. For p > Pc the success probabil- 
ity tends to 1/2 in this limit, making error correction no 
better than guessing. 

Note that P{e\A) can be related to the unconditioned 
probabilities of e and A by P(e|A) = P{e)/P{A). Since 
P{A) is a common factor for all E, it does not need to be 
calculated in order to determine which of the P{E\A) is 
greater, and hence which E is most likely. The P{e) may 
be calculated easily. For depolarizing noise P{e) = (1 — 
p)"'^""" (p/S)"" , where rig is the number of spins on which 
a. ax, <Jy or CTz has occurred on the error configuration 
e. The number of error configurations consistent with 
any anyon configuration is 2^, where N = 2L^ — 1 is the 
total number of plaquettes in the code. Calculating the 
P{E\A) using a brute force approach will therefore take 
a time that is exponential with the system size. In fact, 
the scaling of this is so bad that no existent computer 
could correct an L = 11 planar code in less than the 
age of the universe. As such, approximate methods are 
used to determine the most likely logical error for any 
anyon configuration. These achieve thresholds that are 
lower than the ideal case, but run for realistic time-scales 

[SHE]. 

The algorithm presented here uses a Markov chain 
Monte Carlo method to sample error configurations from 
the distribution P(e\A). By taking many such samples, 
the probabilities P{E\A) may then be approximated and 
hence the most likely logical error found. The most 
straightforward way to carry out this procedure, given an 
anyon configuration A, is using the Metropolis method as 
follows [TT] . First a pattern of errors cq G A is generated 
randomly. This can be done in 0{LF') time by first plac- 
ing errors such that all anyons are connected, and then 
randomly applying each of the stabilizer. The first step 
ensures that eo is within A. The second ensures that it is 



random, since application of stabilizers deforms the error 
configuration without changing the anyon configuration. 
Once Co is generated, it can be used to generate a second 
configuration, ei S A. To do this, a random change is 
made to eo to create a configuration Cq G A. The ratio, 

P{eo\A) \l-pj 

is then determined. If r-(e,e') > 1, we set ei = Cg. If 
r(e,e') > 1, we set ei — c'q with probability r(e, e') and 
ei — eo with probability 1 — r(e, e'). This process then 
continues until the sequence of e„ converges, at which 
point they will be generated according to the distribution 
Pie\A) m- 

The most intuitive method that could be used to gen- 
erate each e^ from each e„ is to randomly pick a stabi- 
lizer and apply it. This will cause an 0(1) change in the 
number of errors and hence yield an r(e„,e^) of 0(1). 
However, only making such changes means that only er- 
ror configurations corresponding to the same E as eo will 
be generated. As such, additional changes in which logi- 
cal operators spanning the code can be randomly applied 
must also be made, such that configurations from all E 
are sampled from. However, these will add 0{L) errors 
to any configuration on which they are applied, resulting 
in r(e„,eJJ = 0{exp —L). Since the acceptance of such 
changes is exponentially small, the time taken to conver- 
gence will be at least 0(exp L). Some additional methods 
are therefore required to avoid this source of inefficiency. 

A solution to the problem is to use parallel temper- 
ing [12]. For this, many Markov chains such as that 
described above are run in parallel. Let us use Nc to 
denote the number of such chains, and restrict it to be- 
ing odd. The first chain (which we will refer to as the 
bottom chain) works exactly as described above. Each 
e^ is generated from e„ by application of a random sta- 
bilizer. No logical operators are applied to change the 
value of E. The second chain works in the same way, 
except for a difference in the calculation of the r(e„, ej^). 
Instead of using the error rate p when calculating the 
P(e), a slightly higher error rate p2 = p + A is used, 
where A = (0.5 — p)/{Nc — 1)- Similarly the mth chain 
will use an error rate of Pm — P + (™ — 1) A. Using this 
prescription, the A^cth chain (which we will refer to as 
the top chain) has p^^ = 0.5, and so 7'(e„,e^) = 1 in all 
cases. As such, we need not restrict each e^ for this chain 
to be only an 0(1) change away from e„. Accordingly, 
the are generated randomly and independently from 
the e„ by randomly applying all stabilizers and logical 
operators. It is therefore in the top chain, and only the 
top chain, where the value of E changes. 

The randomness in E generated in the top chain is in- 
troduced to the rest of them as follows. After running 
each chain for a certain number of iterations, swaps be- 
tween neighbouring chains are attempted. For a swap 



3 



between chains m and m + 1, the ratio 

is calculated. Here denotes the current state of the 
mth chain, etc. This is a straightforward generalization 
of Eq. [4] to the state of two chains rather than one, 
where the proposed change is the swap of states. If 
r(e™,e™+^) > 1, the configuration e™ is set as the new 
state of the m+ 1th chain, and vice- versa. Otherwise this 
is done with probability r(e'", e™"'"^) and the chains are 
left alone with probability 1— r(e'", e™"'"^). The Metropo- 
lis process is then again run on each chain for a number 
of iterations before a further break in which swaps are 
attempted, continuing until convergence. Henceforth we 
will refer to a certain number of Metropolis iterations 
followed by a break to attempt swaps as a 'step' of the 
algorithm. 

In order for the states of high chains to be able to mi- 
grate down to the bottom in a time faster than 0(exp L), 
it must be ensured that the r(e'", e"*"*"^) do not decay 
with system size. Since a system of side length L has 
2L^ physical spins, and since the number of errors in 
any chain should be proportional to its error probability, 
we see that the difference in the number of errors for two 
neighbouring chains is n^m+i —Ue^ = O(L^A). Also, if A 
issmaU, ln([p„/pm+i][(l-p,„+i)/(l-p™)]) = 0(A). As 
such r(e'",e™+i) = 0{exp[L^ A^]), and so A = 0{L-^) 
will lead to r(e™, e™"'""'^) = 0(1). In order to achieve this 
Nc — 0{L) chains are used. The numerical simulations 
confirm that this leads to ^(e™, e™"*"^) that do not decay 
with system size. 

The total number of unique samples originating in 
the top chain that have filtered down to the bottom 
is counted throughout the process as a measure of its 
progress. This number is denoted topsO. Convergence 
is tested for by a variant of the Geweke diagnostic [13] . 
To do this the number of errors present in the first chain 
are recorded at the end of each step. Averages are then 
made over the second and fourth quarters of this data 
and these are compared. If the process has converged, 
these averages should be equal. As such, if the averages 
remain within a tolerance of e of each other for a certain 
number of steps, the process is taken to be converged. 
This number of steps is taken to be that required for 
topsO to increase by an amount SEQ. To reduce serial 
correlations, and ensure that states from all chains have 
had a chance to migrate to the bottom, the comparison 
between the averages is not made until topsO has reached 
a value of TOPS. The values of E are recorded during the 
period over which the averages remain within e. The log- 
ical error in the majority over all these is then taken to 
be the most likely. 

The above tests for convergence of the process to its 
stationary distribution, P{e\A). However, this is not nec- 
essarily required in order to determine which of the logi- 



cal errors is most likely. As such, a test which determines 
when the most likely logical error has been decided should 
lead to a lower runtime. To do this, the value of E is 
recorded at the end of each step and the majority values 
for the second and fourth quarters of this data are de- 
termined. If these remain equal for the number of steps 
required for topsO to increase by SEQ, their shared value 
is taken to be the most likely value of E. As before, 
to reduce serial correlations, the comparison between the 
averages is not made until topsO = TOPS. Also the values 
of E are not recorded until topsO = 1. Implementations 
of the algorithm using the first and second convergence 
tests described here will be referred to as the first and 
second variants of the algorithm. 

Results: The task of an error correcting code is to 
reduce the logical error rate, P, to some desired value. 
The resources required for this task are the number of 
spins that must be used, and the time that is taken to 
decode the information at readout. In order for an error 
correction algorithm to be called efficient, it must be able 
to obtain any given P with both a system size and run- 
time of 0(polyP~^). Note that it is this scaling of the 
run-time of the algorithm with logical error rate that is of 
primary importance, the scaling of run-time with system 
size is only a secondary concern. In fact, since the planar 
code is theoretically capable of obtaining any given P 
with system size L = 0(log P^^) (as long as the spin 
error rate is below threshold), the correction algorithm 
could have a run-time of 0(exp L) and still be called 
efficient. 

In Fig. [2] (a) and (d) the logical bit flip error rate, P, 
is plotted against system size for a range of single spin 
error rates, p. If p is under (over) the threshold value Pc 
the logical error rate will decrease (increase) with system 
size. From the results, it is evident that Pc ~ 0.185 for 
the first variant and Pc ~ 0.18 for the second. These val- 
ues fall slightly short of Pc = 0.189, the value that would 
be achieved by a brute force method O [10] . Theoreti- 
cally the algorithms should achieve the maximum value 
as e — ^ or SEQ — )■ 00. However, the runtime required for 
this will be prohibitive. The logical error rates decrease 
quickly for increasing L, showing that the efficiency con- 
dition L = ©(polyP"-"^) is satisfied in both cases. In fact 
the results suggest that P is suppressed exponentially 
with L, especially for the first variant, leading to a very 
efficient L = 0[\ogP-^). 

In Fig. [2] (b) and (e) the logarithm of the number of 
steps required by the algorithm before convergence, T, 
is plotted against system size for p — 0.17. Since each 
step requires O(logi) actions on Nc — 0{L) chains, the 
total time complexity of the process is 0(TL log L). The 
results show that the time complexity is sub-exponential 
with L. In order to determine more information about 
this scaling. In In T/ In In is plotted against N in Fig. 
[2](c) and (f). Conver gence of In In T/ In In A^ to a value 
c as A^ — >■ cx) means that T = 0(e(i"^)'). A value of 
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FIG. 2. Results for the first variant of tlie algoritlim are 
presented in (a), (b) and (c), on the left. Those for the second 
are in (d), (e) and (f), on the right. Both were run using the 
nearest odd integer to L for Nc, ten metropolis iterations per 
step, e = 0.1 and TOPS = 10. The first used SEq = 2 and the 
second used SEQ = 10. (a) and (d) show plots of the logical 
bit flip error rate after correction, P, against linear system 
size, L. Each point was averaged over 10* samples. Fit lines 
for the threshold values are shown as a guide to the eye. (b) 
and (e) show plots of T, the number of steps required by 
the algorithm before convergence, against L. Since only the 
order of magnitude of the run-time is important, the number 
of samples averaged over was reduced to between 10 and 100, 
allowing higher system sizes to be probed, (c) and (f) show 
plots of In In T/ In In TV against A*' for the same data points as 
(b) and (e). 



c < 1 corresponds to sub-linear scaling with A'', c = 1 to 
polynomial scaling and c > 1 to quasi-polynomial scaling. 
The results suggest convergence to a value of c 1.4 
for the first variant of the algorithm. This implies that 
the time complexity of the algorithm is at least quasi- 
polynomial with the system size. The second variant 
does not converge within the system sizes considered, and 
goes below the c ~ 1.4 convergence point of the first 
variant. This implies that it has a different and more 
efficient scaling than the first variant, balancing the drop 
in threshold with an increase in efficiency. Regardless of 



the exact time complexity with respect to system size, 
combining the sub-exponential scaling of T with L with 
the apparent exponential suppression of P with L implies 
an efficient scaling of T = 0(polyP~^). 

It is also important to determine the effectiveness of 
the algorithm for low error rates, for which the logical 
error rates should scale as 0(pL(''+i)/2J ) for small p |14j . 
The distance, d of the planar code is L -I- 1 for bit flip 
errors and L for phase flip errors. Numerical simulations 
for the performance of codes from L = 2 to L = 4 for 
error rates from 0.5% to 3% show a good fit to such 
scaling, implying that this algorithm does indeed allow 
the code to utilize its full distance. 

Conclusions: We have presented an algorithm for error 
correction in the planar code, which runs in a time that 
is sub-exponential with the system size, and which pro- 
vides efficient suppression of logical errors. It is demon- 
strated that this achieves thresholds higher than existing 
algorithms, approaching the theoretical bounds. Future 
work will be dedicated to further development of the the 
algorithm, to increase the threshold while decreasing the 
time complexity. Also, application of the algorithm to 
the case of noisy stabilizer measurements is sure to yield 
important results for this physically realistic case. 
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